We're going to look at shootings and homicides from the Tribune's internal database and group them by community area. This expands on the example in the "Shootings and homicides within the Austin community area" notebook because it gets data for all community areas and uses a spatial index so we don't have to loop through all the community areas for each incident.

First, we need to get the community area boundaries.



In [1]:

    
import requests

def get_chicago_community_areas():
    """Download the community area boundaries as a GeoJSON FeatureCollection"""
    url = 'https://data.cityofchicago.org/api/geospatial/cauq-8yn6?method=export&format=GeoJSON'
    # Leave certificate verification on (the requests default) and fail loudly on HTTP errors
    resp = requests.get(url)
    resp.raise_for_status()
    return resp.json()

community_areas = get_chicago_community_areas()

Now, let's convert the GeoJSON dicts to shapes that we can use to look up which community area a shooting or homicide is in.



In [2]:

    
from shapely.geometry import shape
# Get the shapes as a map between community area number and shape as we'll need the IDs anyway to build our index later
community_area_shapes = {int(f['properties']['area_num_1']): shape(f['geometry']) for f in community_areas['features']}
community_area_properties = {int(f['properties']['area_num_1']): f['properties'] for f in community_areas['features']}

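Build a spatial index of community areas. The index stores only each area's bounding box, so a lookup returns candidate areas that still need an exact check against the shape.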



In [3]:

    
from rtree import index

communty_area_index = index.Index()
for ca_number, ca_shape in community_area_shapes.items():
    communty_area_index.add(ca_number, ca_shape.bounds, obj=community_area_properties[ca_number])

Let's spot-check our index, just because the coordinate format, (left, bottom, right, top), is a little confusing to me.



In [4]:

    
from shapely.geometry import Point

def point_to_bounds(point):
    """
    Convert a point to a bounding box
    
    It makes sense to represent points as an x,y pair, but RTree only operates
    on bounding boxes. Convert the point to a bounding box where left == right
    and top == bottom.

    """
    return (point[0], point[1], point[0], point[1])

def get_community_area(point, ca_idx, ca_shapes):
    areas = []
    for n in ca_idx.intersection(point_to_bounds(point), objects=True):
        ca_number = int(n.object['area_num_1'])
        ca_shape = ca_shapes[ca_number]
        if ca_shape.contains(Point(*point)):
            areas.append(n.object)
    return areas
        
# Turkey Chop is a restaurant that is most definitely in Humboldt Park
# Let's use it to spot-check our index
turkey_chop_coords = [-87.7141142377237, 41.8955710581678]

turkey_chop_ca = get_community_area(turkey_chop_coords, communty_area_index, community_area_shapes)
assert turkey_chop_ca[0]['community'] == "HUMBOLDT PARK"
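
To see why `get_community_area` checks `contains` after querying the index, here is a minimal plain-Python sketch of the same two-stage lookup. It is a stand-in for the RTree and Shapely calls above, not a replacement for them: the two triangles, their names, and the helpers `bounds`, `bounds_contain`, `polygon_contains` and `lookup` are all made up for illustration.

```python
def bounds(polygon):
    """Return (left, bottom, right, top) for a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))

def bounds_contain(box, point):
    """Cheap test: is the point inside the bounding box?"""
    left, bottom, right, top = box
    x, y = point
    return left <= x <= right and bottom <= y <= top

def polygon_contains(polygon, point):
    """Exact test using ray casting: count edge crossings to the right of the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def lookup(point, shapes):
    """Filter by bounding box first, then confirm with the exact test."""
    candidates = [name for name, poly in shapes.items()
                  if bounds_contain(bounds(poly), point)]
    matches = [name for name in candidates
               if polygon_contains(shapes[name], point)]
    return candidates, matches

# Two hypothetical triangles that split a square, so they share one bounding box
shapes = {
    'LOWER LEFT': [(0, 0), (4, 0), (0, 4)],
    'UPPER RIGHT': [(4, 0), (4, 4), (0, 4)],
}

candidates, matches = lookup((3, 3), shapes)
print(candidates)  # both triangles pass the bounding-box filter
print(matches)     # only one actually contains the point
```

The bounding-box stage can return more than one area for a single point, which is why the exact check is what decides the final answer.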

Now, let's get some data from NewsroomDB, the Tribune's internal database of homicides and shootings.



In [5]:

    
import os
import requests

# Some constants
NEWSROOMDB_URL = os.environ['NEWSROOMDB_URL']

# A big object to hold all our data between steps
data = {}

def get_table_url(table_name, base_url=NEWSROOMDB_URL):
    return '{}table/json/{}'.format(base_url, table_name)

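def get_table_data(table_name, max_attempts=5):
    url = get_table_url(table_name)

    # Retry a bounded number of times instead of recursing until the stack overflows
    for _ in range(max_attempts):
        try:
            r = requests.get(url)
            r.raise_for_status()
            return r.json()
        except (requests.exceptions.RequestException, ValueError):
            print("Request failed. Probably because the response is huge.  We should fix this.")

    raise RuntimeError("Could not load table {} after {} attempts".format(table_name, max_attempts))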

data['shooting_victims'] = get_table_data('shootings')
print("Loaded {} shooting victims".format(len(data['shooting_victims'])))

data['homicides'] = get_table_data('homicides')
print("Loaded {} homicides".format(len(data['homicides'])))

Request failed. Probably because the response is huge.  We should fix this.
Request failed. Probably because the response is huge.  We should fix this.
Loaded 11586 shooting victims
Loaded 1542 homicides

Let's create pandas DataFrames out of the loaded data.



In [6]:

    
import pandas as pd
import numpy as np

data['shooting_victims_df'] = pd.DataFrame(data['shooting_victims'])
data['homicides_df'] = pd.DataFrame(data['homicides'])

Parse the date fields into Python date objects for easier analysis and make separate month and year columns to make grouping easier.



In [7]:

    
from datetime import datetime

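def parse_date(s):
    """Parse a YYYY-MM-DD string into a date, or return None if it can't be parsed"""
    try:
        return datetime.strptime(s, '%Y-%m-%d').date()
    except (TypeError, ValueError):
        # TypeError covers missing (None) values; ValueError covers malformed strings
        return None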
    
data['shooting_victims_df']['Date'] = data['shooting_victims_df']['Date'].apply(parse_date)
data['shooting_victims_df']['month'] = data['shooting_victims_df']['Date'].apply(lambda x: x.month if x else None)
data['shooting_victims_df']['year'] = data['shooting_victims_df']['Date'].apply(lambda x: x.year if x else None)
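
As a quick illustration of how the date parsing behaves on messy input, here is a small sketch that runs a copy of `parse_date` over a few sample strings. The sample values are made up, and this copy also catches `TypeError` so that `None` inputs are tolerated, which is an assumption about what the real data might contain.

```python
from datetime import date, datetime

def parse_date(s):
    """Parse a YYYY-MM-DD string into a date, or return None on bad input."""
    try:
        return datetime.strptime(s, '%Y-%m-%d').date()
    except (TypeError, ValueError):
        return None

# Hypothetical raw values: one good date, one in the wrong format,
# one empty string and one missing value
raw_dates = ['2016-03-14', '03/14/2016', '', None]

parsed = [parse_date(s) for s in raw_dates]
months = [d.month if d else None for d in parsed]
years = [d.year if d else None for d in parsed]

print(parsed)
print(months)
print(years)
```

Records whose dates fail to parse end up with no month or year, so they drop out of any grouping on those columns. Those missing values are also the likely reason the year and month show up as floats such as `2016.0` and `3.0` in the output tables further down.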

We'll start with shootings. Assign each shooting to a community area using the index we built earlier.



In [8]:

    
import pprint
import re

def parse_coordinates(coordinate_str):
    """Convert a lat, lng string to a pair of lng, lat floats"""
    lat, lng = [float(c) for c in re.sub(r'[\(\) ]', '', coordinate_str).split(',')]
    return lng, lat

shooting_victim_community_areas = {}

for victim in data['shooting_victims']:
    try:
        coords = parse_coordinates(victim['Geocode Override'])
    except ValueError:
        shooting_victim_community_areas[victim['_id']] = '__invalid__'
        continue
        
    ca = get_community_area(coords, communty_area_index, community_area_shapes)
    
    if len(ca) == 0:
        shooting_victim_community_areas[victim['_id']] = '__invalid__'
        print("No community area found for record with coordinates {}".format(coords))
    elif len(ca) > 1:
        raise ValueError("Multiple community areas found for record with coordinates {}".format(coords))
    else:
        shooting_victim_community_areas[victim['_id']] = ca[0]['community']
        
data['shooting_victim_community_areas'] = pd.DataFrame([{'_id': k, 'community': v} for k, v in shooting_victim_community_areas.items()])

No community area found for record with coordinates (-87.742872, 41.762969)
No community area found for record with coordinates (-87.742872, 41.762969)
No community area found for record with coordinates (-87.690307, 41.730243)
No community area found for record with coordinates (-87.651214, 41.511413)
No community area found for record with coordinates (-87.812812, 41.911125)
No community area found for record with coordinates (-87.930351, 41.958801)
No community area found for record with coordinates (-87.682444, 41.730165)
No community area found for record with coordinates (-87.652854, 41.508438)
No community area found for record with coordinates (-87.627502, 41.504604)
No community area found for record with coordinates (-87.700546, 42.019557)
No community area found for record with coordinates (-84.5535506308079, 41.6678441315889)
No community area found for record with coordinates (-87.700546, 42.019557)
No community area found for record with coordinates (-95.9222953766584, 35.9909527748823)
No community area found for record with coordinates (-87.762807789495, 41.8110412403273)
No community area found for record with coordinates (-87.81238630414009, 41.95269003510475)
No community area found for record with coordinates (-97.94587723910809, 35.53741604089737)
No community area found for record with coordinates (-87.85710543394089, 42.867526486516)
No community area found for record with coordinates (-87.85710543394089, 42.867526486516)
No community area found for record with coordinates (-117.07955932617188, 32.69026184082031)
No community area found for record with coordinates (-88.18666309118271, 41.70995280146599)
No community area found for record with coordinates (-118.30857849121094, 33.802249908447266)
No community area found for record with coordinates (-89.16730619966984, 45.15423908829689)
No community area found for record with coordinates (-87.87754821777344, 42.09328079223633)
No community area found for record with coordinates (-119.80324544012547, 39.53507088124752)
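
The coordinate parsing is easy to get backwards, since the stored string is in lat, lng order while the shapes use x, y (lng, lat) order. The sketch below runs a copy of `parse_coordinates` over a few sample strings to show which inputs end up marked `__invalid__`. The sample strings are made up, not taken from NewsroomDB.

```python
import re

def parse_coordinates(coordinate_str):
    """Convert a "lat, lng" string to a (lng, lat) pair of floats."""
    lat, lng = [float(c) for c in re.sub(r'[\(\) ]', '', coordinate_str).split(',')]
    return lng, lat

# Hypothetical inputs: with and without parentheses, empty, and missing a value
samples = ['(41.8955, -87.7141)', '41.8955,-87.7141', '', '41.8955']

results = {}
for s in samples:
    try:
        results[s] = parse_coordinates(s)
    except ValueError:
        # Both an unparseable number and a wrong number of values raise ValueError
        results[s] = '__invalid__'

for s, parsed in results.items():
    print(repr(s), '->', parsed)
```

Note that a value of `None` would raise `TypeError` rather than `ValueError` inside `re.sub`, so the loop above (like the one in the notebook) assumes every record has a string in that field.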

Join the community area to the shooting victims data.



In [9]:

    
data['shooting_victims_df__with_ca'] = data['shooting_victims_df'].merge(
    data['shooting_victim_community_areas'],
    how='left',
    on='_id')

And count the victims by community area, year and month.



In [32]:

    
data['shooting_victims_by_ca'] = pd.DataFrame(data['shooting_victims_df__with_ca'].groupby(['community', 'year', 'month']).size())
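
For readers less familiar with `groupby(...).size()`, the same tally can be sketched in plain Python by counting records per (community, year, month) key. This is only an illustration of what the grouped count computes, not how pandas implements it, and the records below are made up.

```python
from collections import Counter

# Hypothetical victim records with the same three grouping fields
victims = [
    {'community': 'AUSTIN', 'year': 2016, 'month': 3},
    {'community': 'AUSTIN', 'year': 2016, 'month': 3},
    {'community': 'HUMBOLDT PARK', 'year': 2016, 'month': 3},
    {'community': 'AUSTIN', 'year': 2016, 'month': 2},
    {'community': 'HUMBOLDT PARK', 'year': 2015, 'month': 3},
]

# One count per (community, year, month) combination
counts = Counter((v['community'], v['year'], v['month']) for v in victims)

# Keep only March 2016 and sort with the largest count first
march_2016 = sorted(
    ((community, n) for (community, year, month), n in counts.items()
     if year == 2016 and month == 3),
    key=lambda pair: pair[1],
    reverse=True,
)

print(march_2016)
```

Each key in `counts` corresponds to one row of the grouped DataFrame, and the value is what appears in its single unnamed column `0`.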

Let's just look at March 2016 shooting victims, sorted from most to fewest.



In [33]:

    
df = data['shooting_victims_by_ca']
df[(df.index.get_level_values('year') == 2016) & (df.index.get_level_values('month') == 3)].sort_values(by=0, ascending=False)

    Out[33]:

    community               year    month   0
    AUSTIN                  2016.0  3.0     36
    HUMBOLDT PARK           2016.0  3.0     27
    WEST ENGLEWOOD          2016.0  3.0     23
    NORTH LAWNDALE          2016.0  3.0     18
    WEST GARFIELD PARK      2016.0  3.0     15
    EAST GARFIELD PARK      2016.0  3.0     13
    NEW CITY                2016.0  3.0     12
    AUBURN GRESHAM          2016.0  3.0     11
    ENGLEWOOD               2016.0  3.0     11
    SOUTH LAWNDALE          2016.0  3.0     9
    ROSELAND                2016.0  3.0     9
    GREATER GRAND CROSSING  2016.0  3.0     8
    CHICAGO LAWN            2016.0  3.0     8
    WASHINGTON HEIGHTS      2016.0  3.0     6
    WEST PULLMAN            2016.0  3.0     6
    WOODLAWN                2016.0  3.0     6
    CHATHAM                 2016.0  3.0     6
    SOUTH DEERING           2016.0  3.0     5
    NEAR WEST SIDE          2016.0  3.0     5
    SOUTH SHORE             2016.0  3.0     4
    GRAND BOULEVARD         2016.0  3.0     4
    WEST TOWN               2016.0  3.0     4
    MORGAN PARK             2016.0  3.0     4
    BELMONT CRAGIN          2016.0  3.0     4
    __invalid__             2016.0  3.0     4
    UPTOWN                  2016.0  3.0     3
    ROGERS PARK             2016.0  3.0     3
    ALBANY PARK             2016.0  3.0     3
    NEAR NORTH SIDE         2016.0  3.0     3
    CALUMET HEIGHTS         2016.0  3.0     3
    IRVING PARK             2016.0  3.0     2
    AVONDALE                2016.0  3.0     2
    BRIGHTON PARK           2016.0  3.0     2
    WASHINGTON PARK         2016.0  3.0     2
    EAST SIDE               2016.0  3.0     2
    HERMOSA                 2016.0  3.0     2
    GAGE PARK               2016.0  3.0     2
    LOWER WEST SIDE         2016.0  3.0     2
    LOOP                    2016.0  3.0     1
    SOUTH CHICAGO           2016.0  3.0     1
    NORWOOD PARK            2016.0  3.0     1
    MCKINLEY PARK           2016.0  3.0     1
    DOUGLAS                 2016.0  3.0     1
    RIVERDALE               2016.0  3.0     1
    WEST LAWN               2016.0  3.0     1
    PULLMAN                 2016.0  3.0     1
    WEST RIDGE              2016.0  3.0     1
    BRIDGEPORT              2016.0  3.0     1
    OAKLAND                 2016.0  3.0     1
    ASHBURN                 2016.0  3.0     1

How have March shootings in Humboldt Park changed over time?



In [34]:

    
df = data['shooting_victims_by_ca']
df[(df.index.get_level_values('community') == "HUMBOLDT PARK") & (df.index.get_level_values('month') == 3)]

    Out[34]:

    community       year    month   0
    HUMBOLDT PARK   2012.0  3.0     8
                    2013.0  3.0     5
                    2014.0  3.0     6
                    2015.0  3.0     6
                    2016.0  3.0     27


